Abstract

Background: In early-stage cancer, primary treatment can be considered as effective at eliminating the tumor for a non-negligible proportion of patients, whereas for the others it leads to a lower tumor burden and thereby potentially prolonged survival. In this mixed population of patients, it is of great interest to detect complex differences in survival distributions associated with molecular markers that potentially activate latent downstream pathways implicated in tumor progression.

Method: We propose a novel model-based score test designed for identifying molecular markers with complex effects on survival in early-stage cancer. From a biological point of view, the proposed score test can detect complex changes in the survival distributions linked to either the tumor burden or its dynamic growth.

Results: Simulation results show that the proposed statistic is powerful at identifying departures from the null hypothesis of no survival difference. The practical use of the proposed statistic is exemplified by analyzing the prognostic impact of Kras mutation in early-stage lung adenocarcinoma. This analysis leads to the conclusion that Kras mutation has a significant negative prognostic impact on survival. Moreover, it emphasizes that the complex role of Kras mutation on survival would have been overlooked by considering only the results of the classical logrank test.

Conclusion: With the growing number of biological markers to be tested in early-stage cancer, the proposed score 
test statistic is a powerful tool for detecting molecular markers associated with complex survival patterns. 
Background

Entering the era of so-called personalized oncology through the growing use of molecular markers, one of the main questions concerns their capacity to refine patient prognosis beyond classical bio-clinical risk factors. Within clinically and pathologically well-defined groups of patients, these markers need to demonstrate their ability to reveal heterogeneity in survival times among patients. For patients with early-stage cancer treated with curative therapy, the problem is particularly challenging since molecular markers often reflect a complex interplay of downstream pathways that drive either the remaining tumor burden or its dynamic growth.

*Correspondence: philippe.broet@inserm.fr
1 Assistance Publique-Hôpitaux de Paris, Hôpital Paul Brousse, Villejuif, France
2 Faculty of Medicine, University Paris-Sud, Paris, France
3 INSERM, UMR-669, Villejuif, France
Full list of author information is available at the end of the article

Cure rate models, especially those with a biological interpretation, are well suited to analyzing such data. These models assume that the population under study is composed of two subpopulations of patients: those who have no persistent tumor (sometimes referred to as long-term survivors or cured patients) and those who have a persistent tumor burden and are susceptible to experiencing a disease recurrence. In the literature, the oldest approach relies on two-component mixture models which incorporate a cure fraction in a parametric or semi-parametric framework (for a review, see [1]). A different approach, which defines the cumulative hazard as a bounded increasing positive function and relies on a mechanistic model of cancer, was introduced by Yakovlev et al. [2-4]. This cure rate model (sometimes referred to as the promotion time cure model [5]) defines the improper survival distribution whereby each individual is exposed to recurrences that arise from unobservable tumor clonogens surviving the primary treatment. A clonogen is defined as a cell (or a group of genotypically identical cells) that has the capacity to divide, disseminate and proliferate indefinitely, giving rise to local or distant tumor recurrence. Each surviving clonogen has its own dynamic growth, and the tumor is detected as soon as any one of the clonogens is able to produce a clinically overt tumor. The elapsed time between the end of the primary treatment and the clinical detection of the disease corresponds to the time-to-event. Assuming relevant probability distributions for the number of (unobserved) clonogens and for the clonogenic times-to-event, one can deduce the marginal (or population) survival distribution.
From biological considerations, the Poisson distribution has been the classical choice for the distribution of the number of clonogens [4,5]. Relying on this modelling assumption, marginal semi-parametric cure models have been proposed, from which different statistics have been derived to test for identity of the survival curves [6-8]. However, a limitation of the Poisson distribution, on which these models are built, is that it is not flexible enough to allow different probability distributions of the number of surviving clonogens among uncured patients. In particular, if the probability of being cured (no clonogen) after the primary treatment is identical across all patients, it necessarily implies the same distribution for the number of surviving clonogens among uncured patients. In this context and from a Bayesian perspective, Yin et al. [9] proposed a family of transformation cure models that gives more flexibility for modelling survival curves and includes the two-component mixture model and the Poisson cure model as special cases [9,10]. However, this family does not provide an easy biological interpretation of changes in the cure fraction, the distribution of surviving clonogens and the tumor progression.

In this work, based on an alternative mechanistic cure rate model, we propose a novel score test statistic for detecting molecular markers associated with complex survival patterns in early-stage cancer. After introducing an alternative semi-parametric cure rate model that describes changes in the survival distributions linked to either the tumor burden (cure fraction and distribution of surviving clonogens) or its dynamic growth (time-to-event distribution), we derive a model-based score test from it. We illustrate the clinical interest of this statistic by investigating the impact on survival distributions of genetic (Kras mutation), genomic (chromosomal aberration) and histopathologic markers among patients with early-stage lung adenocarcinoma.

Methods 

Modeling background 

Here, we focus on a binary variable which allocates the patients to two groups $i = 0, 1$ (with $n_i$ subjects in group $i$ and $n = n_0 + n_1$). For each patient, $G_j$ denotes the indicator variable of group 1. For the lung cancer dataset, this variable indicates the presence/absence of Kras mutation. In the following, a tumor is modeled as a set of clonogens with identical properties and independent evolution. For each patient $j$ in group $i$, let the random variable $T_{ij}^k$, associated with the $k$th latent (unobservable) clonogen, be the time-to-progression until a detectable recurrence, with (clonogenic) survival function $A_i(t)$. Let $K_{ij}$ be the number of latent clonogens that survived the treatment for patient $j$ in group $i$. We suppose that, for the two groups, $K_{ij}$ is distributed with probability mass functions $\Pr_{\phi_0}$ and $\Pr_{\phi_1}$, and that $K_{ij}$ is independent of the $T_{ij}^k$. Let $T_{ij}^* = \min_{1 \le k \le K_{ij}} T_{ij}^k$ denote the time-to-event of the earliest clonogen (with $T_{ij}^* = +\infty$ when $K_{ij} = 0$) and $C_{ij}$ the censoring time. We assume that $T_{ij}^*$ and $C_{ij}$ satisfy the condition of independent censoring [10]. For each subject, the data consist of $X_{ij} = \min(T_{ij}^*, C_{ij})$, the observed follow-up time, $\delta_{ij} = 1(X_{ij} = T_{ij}^*)$, the indicator of the occurrence of the earliest clonogen, and $G_j$, the indicator variable of group 1. We also denote by $Y_{ij}(t) = 1(t \le X_{ij})$ the indicator of being at risk for an event at time $t$.

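For each patient $j$ in group $i$ with $K_{ij}$ latent clonogens, the conditional (patient-specific) survival function is expressed as:

$$S_{ij}(t \mid K_{ij}) = \Pr(T_{ij}^* > t \mid K_{ij}) = \Pr(T_{ij}^1 > t, \ldots, T_{ij}^{K_{ij}} > t) = A_i(t)^{K_{ij}}.$$

Thus, the marginal (population) survival function for group $i$ is given by:

$$S_i(t) = \sum_{k=0}^{\infty} S_{ij}(t \mid k)\Pr_{\phi_i}(k) = \sum_{k=0}^{\infty} A_i(t)^k \Pr_{\phi_i}(k).$$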

Assuming that the number of clonogens in treated tumors follows a Poisson distribution in both groups [2-4], the marginal distribution is $S_i(t) = \exp\{-\omega_i[1 - A_i(t)]\}$, where $\omega_i$ (i.e. the Poisson parameter) is the mean number of clonogens and $\exp(-\omega_i)$ is the probability of having no surviving clonogen (cure fraction). From this framework, one can model short- and long-term effects of a marker [6-8]. The short-term effect (linked to $A_i(t)$) formulates the shape of the difference between the (clonogenic) latent survival functions. The long-term effect (linked to $\omega_i$) quantifies the difference in the long-term survivor rates. It is straightforward to see that the same cure fraction in the different groups (no long-term effect) implies the same distribution for the number of surviving clonogens, since $\exp(-\omega_i)$ determines $\omega_i$, which fully specifies the Poisson distribution.
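The Poisson closed form follows from the series for the marginal survival function by summing the exponential series, writing $\omega_i$ for the Poisson mean so that $\Pr_{\phi_i}(k) = e^{-\omega_i}\omega_i^k/k!$:

$$S_i(t) = \sum_{k=0}^{\infty} A_i(t)^k \frac{e^{-\omega_i}\omega_i^k}{k!} = e^{-\omega_i}\sum_{k=0}^{\infty} \frac{[\omega_i A_i(t)]^k}{k!} = e^{-\omega_i}\, e^{\omega_i A_i(t)} = \exp\{-\omega_i[1 - A_i(t)]\}.$$

Letting $t \to \infty$, so that $A_i(t) \to 0$, leaves the plateau $e^{-\omega_i}$. Because this single number fixes $\omega_i$, and $\omega_i$ fully specifies a Poisson law, equal cure fractions force identical clonogen-count distributions in the two groups.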

In the following, we consider a family of discrete distributions proposed by Katz [11], for which the Poisson distribution is considered as the benchmark model (null model). This family allows different conditional probability mass functions for the number of surviving clonogens, $\Pr_{\phi_i}(K_{ij} = u \mid K_{ij} > 0)$, with the same cure fraction $\Pr_{\phi_i}(K_{ij} = 0)$.

Distribution of the number of clonogens 

We recall that Katz [11,12] proposed a family of discrete distributions with the property that successive count probabilities satisfy the following first-order recurrence formula:

$$\frac{\Pr(x+1)}{\Pr(x)} = \frac{\omega + \theta x}{1 + x},$$

where $\omega > 0$ and $\theta < 1$.

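Katz showed that the probability generating function is:

$$g(s; \omega, \theta) = \left[(1-\theta)^{-1}(1 - \theta s)\right]^{-\omega/\theta} \quad \text{for } \theta \neq 0,$$

$$g(s; \omega, 0) = \exp[-\omega(1 - s)] \quad \text{for } \theta = 0,$$

with $|s| < 1$.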

It follows that the initial probability is $\Pr(0) = p_0 = (1-\theta)^{\omega/\theta}$ for $\theta \neq 0$ ($p_0 = e^{-\omega}$ for $\theta = 0$). Thus, this family allows us to consider different conditional probability mass functions $\Pr(x \mid x > 0)$ with the same $p_0$.

Moreover, it is worth noting that $\omega = \mu^2/\sigma^2$, where $\mu$ and $\sigma^2$ denote the mean and variance of the distribution, and that $\theta$ is linked to the dispersion index (variance-to-mean ratio) through $\sigma^2/\mu = (1-\theta)^{-1}$. This family covers distributions that are under-dispersed ($\theta < 0$), over-dispersed ($\theta > 0$) or equi-dispersed ($\theta = 0$), the latter case corresponding to the Poisson distribution. For $\theta < 0$, it includes Binomial distributions ($N = -\omega/\theta$; $p = \theta/(\theta - 1)$), whereas for $\theta > 0$ it includes Negative Binomial distributions ($\nu = \omega/\theta$; $P = \theta/(1 - \theta)$).
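The properties listed above can be checked numerically. The sketch below is illustrative only: the helper names `katz_pmf` and `mean_var` and the parameter values ($\omega = 1.5$, $\theta = 0.78$; $\omega = 3$, $\theta = -1$) are our own choices, not taken from the source. It builds the probabilities from the recurrence and $p_0$, compares them with the Negative Binomial and Binomial forms, and shows two members with the same $p_0 = 0.5$ but different conditional laws given $x > 0$.

```python
import math


def katz_pmf(omega, theta, kmax):
    """Return [Pr(0), ..., Pr(kmax)] for the Katz family via the recurrence
    Pr(x + 1) / Pr(x) = (omega + theta * x) / (1 + x), with omega > 0 and theta < 1.
    For theta < 0 we assume N = -omega / theta is an integer (Binomial case)."""
    if theta == 0:
        p0 = math.exp(-omega)                      # Poisson case
    else:
        p0 = (1.0 - theta) ** (omega / theta)      # Pr(0) = g(0; omega, theta)
    probs = [p0]
    for x in range(kmax):
        probs.append(probs[-1] * (omega + theta * x) / (1 + x))
    return probs


def mean_var(probs):
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var


# Over-dispersed member (theta > 0) versus the Negative Binomial with nu = omega / theta,
# P = theta / (1 - theta): Pr(x) = Gamma(nu + x) / (Gamma(nu) x!) (1 - theta)^nu theta^x.
omega, theta = 1.5, 0.78
katz = katz_pmf(omega, theta, 400)                # truncation at 400: the tail is negligible here
nu = omega / theta
negbin = [math.exp(math.lgamma(nu + x) - math.lgamma(nu) - math.lgamma(x + 1)
                   + nu * math.log(1.0 - theta) + x * math.log(theta)) for x in range(401)]
mu, sigma2 = mean_var(katz)
print(mu, omega / (1 - theta))                    # mean: omega / (1 - theta)
print(sigma2 / mu, 1 / (1 - theta))               # dispersion index: (1 - theta)^(-1)
print(mu ** 2 / sigma2, omega)                    # omega = mu^2 / sigma^2

# Under-dispersed member: omega = 3, theta = -1 gives N = 3 and p = theta / (theta - 1) = 1/2.
binom_katz = katz_pmf(3.0, -1.0, 3)
binom = [math.comb(3, x) * 0.5 ** 3 for x in range(4)]

# Same Pr(0) = 0.5 but different conditional laws Pr(x | x > 0).
omega_pois = math.log(2.0)                                  # Poisson: exp(-omega) = 0.5
omega_over = -theta * omega_pois / math.log(1.0 - theta)    # Katz: (1 - theta)^(omega / theta) = 0.5
pois = katz_pmf(omega_pois, 0.0, 400)
over = katz_pmf(omega_over, theta, 400)
print(pois[1] / (1 - pois[0]), over[1] / (1 - over[0]))   # Pr(K = 1 | K > 0) differs
```

The recurrence is used rather than a closed-form pmf because it covers the Binomial, Poisson and Negative Binomial members with a single loop, exactly as the Katz definition suggests.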

Relying on this family of distributions, we propose to consider the following semi-parametric cure model.

Improper survival function 

According to the above results, a semi-parametric improper cure model, which encompasses the Poisson cure model, is obtained as follows. The marginal survival function is defined as:

$$S_i(t) = \sum_{k=0}^{\infty} S_{ij}(t \mid k)\Pr_{\phi_i}(k) = \sum_{k=0}^{\infty} A_i(t)^k \Pr_{\phi_i}(k),$$

where $\Pr_{\phi_i}(k)$ is the Katz probability mass function and $A_i(t)$ is a decreasing function such that $0 \le A_i(t) \le 1$.



Thus, we have the following general survival functions in groups $i = 0, 1$:

$$S_0(t) = \exp\{-\omega_0[1 - A_0(t)]\}, \qquad S_1(t) = \left[(1-\theta)^{-1}\{1 - \theta A_1(t)\}\right]^{-\omega_1/\theta}. \tag{1}$$

The corresponding cumulative hazard and hazard functions are denoted $\Lambda_i(t) = -\log[S_i(t)]$ and $\lambda_i(t) = \frac{d}{dt}\Lambda_i(t)$, respectively. It is straightforward to see that $S_0(t)$ and $S_1(t)$ are improper survival functions with cure fractions $S_0(\infty^+) = e^{-\omega_0}$ and $S_1(\infty^+) = (1-\theta)^{\omega_1/\theta}$, respectively. Here, $A_0(t)$ and $A_1(t)$ are arbitrary latent survival functions decreasing with time from one to zero.
Different shapes can be obtained by modeling the function as $A_1(t) = A_0(t, \alpha)$, where $D_0(t, \alpha) = -\frac{\partial}{\partial t}A_0(t, \alpha)$ refers to the corresponding density function (with $D_0(t) = D_0(t, 0)$) and $\alpha$ is a real parameter with $A_0(t, 0) = A_0(t)$. In the following section, we will consider a classical log-linear relationship $A_0(t, \alpha) = A_0(t)^{e^{\alpha}}$. Thus, the parameter $\alpha$ formulates the shape of the difference between the clonogenic survival functions of groups 0 and 1. When $\alpha > 0$ (resp. $\alpha < 0$), patients belonging to group 1 have earlier (resp. later) relapses compared with group 0. Here, the Poisson model is considered as the reference one, which leads to the marginal survival $S_0(t)$. Changes in the distribution of the number of clonogens are interpreted with regard to this model. It is worth noting that the Poisson cure model can also be considered as representing a homogeneous multi-clonogenic model, and departure from this model can be interpreted as either an under-dispersed (single-clonogenic model) or over-dispersed (heterogeneous multi-clonogenic model) situation.
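A small numerical sketch of model (1) may help fix ideas. It assumes the latent survival $A_0(t) = e^{-t}$ (the choice used later in the simulations) and the log-linear shift $A_0(t, \alpha) = A_0(t)^{e^{\alpha}}$; the function names `a0`, `s0` and `s1` and the parameter values are ours. It checks the two cure-fraction plateaus, the Poisson limit $\theta \to 0$, and that $\alpha > 0$ lowers $S_1(t)$, i.e. earlier relapses in group 1.

```python
import math


def a0(t):
    """Hypothetical latent survival A0(t) = exp(-t)."""
    return math.exp(-t)


def s0(t, omega0):
    """Poisson reference model: S0(t) = exp{-omega0 [1 - A0(t)]}."""
    return math.exp(-omega0 * (1.0 - a0(t)))


def s1(t, omega1, theta, alpha):
    """Katz model (1): the Katz pgf evaluated at A1(t) = A0(t) ** exp(alpha)."""
    a1 = a0(t) ** math.exp(alpha)
    if theta == 0:
        return math.exp(-omega1 * (1.0 - a1))
    return ((1.0 - theta * a1) / (1.0 - theta)) ** (-omega1 / theta)


omega0 = math.log(2.0)                              # cure fraction exp(-omega0) = 0.5
theta = 0.78
omega1 = -theta * omega0 / math.log(1.0 - theta)    # same cure fraction: (1 - theta)^(omega1 / theta) = 0.5

print(s0(50.0, omega0), math.exp(-omega0))          # plateau of S0 equals its cure fraction
print(s1(50.0, omega1, theta, 0.0), (1 - theta) ** (omega1 / theta))
for alpha in (0.0, math.log(1.25), math.log(1.5)):
    # alpha > 0 lowers S1(t) at a fixed t: earlier relapses in group 1
    print(alpha, s0(1.0, omega0), s1(1.0, omega1, theta, alpha))
```

Evaluating at $t = 50$ stands in for $t \to \infty$, since $e^{-50}$ is numerically zero against 1.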

It is useful for the following to write the ratio of the hazard functions $\lambda_0(t)$ and $\lambda_1(t)$ deduced from model (1):

$$\lambda_1(t) = \lambda_0(t)\exp\left\{\log[\omega_1/\omega_0] + \log[D_0(t, \alpha)/D_0(t)] - \log[1 - \theta A_0(t, \alpha)]\right\}.$$
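The hazard-ratio identity can be verified numerically by differentiating $\Lambda_i(t) = -\log S_i(t)$ with finite differences and comparing with the closed form. This is a sketch under assumed ingredients: latent survival $A_0(t) = e^{-t}$ with the log-linear shift $A_0(t, \alpha) = A_0(t)^{e^{\alpha}}$, arbitrary parameter values, and helper names of our own.

```python
import math


def a0(t, alpha=0.0):
    """Hypothetical latent survival with log-linear shift: A0(t, alpha) = exp(-t) ** exp(alpha)."""
    return math.exp(-math.exp(alpha) * t)


def d0(t, alpha=0.0):
    """Latent density D0(t, alpha) = -dA0(t, alpha)/dt."""
    return math.exp(alpha) * a0(t, alpha)


def cumhaz0(t, omega0):
    return omega0 * (1.0 - a0(t))                   # Lambda_0(t) = -log S0(t)


def cumhaz1(t, omega1, theta, alpha):
    # Lambda_1(t) = -log S1(t) = (omega1 / theta) log[(1 - theta A1(t)) / (1 - theta)]
    return (omega1 / theta) * math.log((1.0 - theta * a0(t, alpha)) / (1.0 - theta))


def hazard(cumhaz, t, h=1e-5):
    """Central finite difference for lambda(t) = dLambda(t)/dt."""
    return (cumhaz(t + h) - cumhaz(t - h)) / (2.0 * h)


def ratio_formula(t, omega0, omega1, theta, alpha):
    return math.exp(math.log(omega1 / omega0) + math.log(d0(t, alpha) / d0(t))
                    - math.log(1.0 - theta * a0(t, alpha)))


omega0, omega1, theta, alpha = math.log(2.0), 0.36, 0.78, math.log(1.5)
max_rel_err = 0.0
for t in (0.2, 1.0, 3.0):
    numeric = (hazard(lambda s: cumhaz1(s, omega1, theta, alpha), t)
               / hazard(lambda s: cumhaz0(s, omega0), t))
    closed = ratio_formula(t, omega0, omega1, theta, alpha)
    max_rel_err = max(max_rel_err, abs(numeric - closed) / closed)
print(max_rel_err)
```

The ratio follows from $\lambda_0 = \omega_0 D_0(t)$ and $\lambda_1 = \omega_1 D_0(t, \alpha)/[1 - \theta A_0(t, \alpha)]$; the numerical check simply confirms that algebra at a few time points.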

In the following, we denote $\gamma = \log[\omega_1/\omega_0]$. From a biological perspective, belonging to group 1 is associated with changes in the cure fraction, the conditional distribution of the number of surviving clonogens or the latent survival (tumor progression) through the parameters of interest $\gamma$, $\theta$ and $\alpha$. If $\alpha = 0$, the latent (clonogenic) survival curves are identical between the two groups whatever the distribution of the number of clonogens. If $\theta = 0$, the number of clonogens follows the same distribution family (Poisson) in both groups whatever the dynamics of the clonogens ($\alpha$) or the cure fraction ($\gamma$); this case corresponds to the classical Poisson cure rate model. If $\theta = \alpha = 0$, we obtain the proportional hazards hypothesis, whereby the relative risk is constant over time but the improper survival distributions converge to different cure fractions. Moreover, it should be noted that using a different parametrization and constraining the quantity $\theta/\omega_1$ to lie in $[0, 1]$ leads to the transformation cure model [9].

In this work, the general null hypothesis to be tested, $H_0: \theta = \alpha = \gamma = 0$, is the lack of survival difference between the two groups.

The proposed statistic 

In the following, we derive a score statistic which is optimal under a classical log-linear relationship $A_0(t, \alpha) = A_0(t)^{e^{\alpha}}$, so that the ratio of the hazard functions between the two groups is:

$$\lambda_1(t) = \lambda_0(t)\exp\left\{\gamma + \alpha + \log[A_0(t)]\left(e^{\alpha} - 1\right) - \log\left[1 - \theta A_0(t)^{e^{\alpha}}\right]\right\}.$$

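Thus, the log-partial likelihood derived under this multiplicative model is:

$$\log L(\theta, \alpha, \gamma; G) = \sum_{j=1}^{n} \delta_j \left[ v(t_j)\, G_j - \log \sum_{k=1}^{n} Y_k(t_j) \exp\{v(t_j)\, G_k\} \right],$$

where $v(t) = \gamma + \alpha + \log[A_0(t)](e^{\alpha} - 1) - \log[1 - \theta A_0(t)^{e^{\alpha}}]$, patients are indexed by $j = 1, \ldots, n$ in the pooled sample, and $t_j$ denotes the observed time of patient $j$.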

The score vector is obtained from the first derivatives of the log-partial likelihood with respect to $\theta$, $\alpha$ and $\gamma$, evaluated under $H_0: \theta = \alpha = \gamma = 0$. Its three components are as follows:



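$$U_{H_0,\alpha} = \sum_{j=1}^{n} \delta_j \left[1 + \log\left(1 - \frac{\Lambda_0(t_j)}{\omega_0}\right)\right] \left[G_j - \frac{\sum_{k=1}^{n} Y_k(t_j)\, G_k}{\sum_{k=1}^{n} Y_k(t_j)}\right],$$

$$U_{H_0,\theta} = \sum_{j=1}^{n} \delta_j \left[1 - \frac{\Lambda_0(t_j)}{\omega_0}\right] \left[G_j - \frac{\sum_{k=1}^{n} Y_k(t_j)\, G_k}{\sum_{k=1}^{n} Y_k(t_j)}\right],$$

$$U_{H_0,\gamma} = \sum_{j=1}^{n} \delta_j \left[G_j - \frac{\sum_{k=1}^{n} Y_k(t_j)\, G_k}{\sum_{k=1}^{n} Y_k(t_j)}\right].$$

Here, under $H_0$ both groups share the cumulative hazard $\Lambda_0(t) = \omega_0[1 - A_0(t)]$, so that $A_0(t) = 1 - \Lambda_0(t)/\omega_0$ has been substituted into the derivatives of $v(t)$, which at $\theta = \alpha = \gamma = 0$ equal $1 + \log A_0(t)$ (for $\alpha$), $A_0(t)$ (for $\theta$) and $1$ (for $\gamma$).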



For computing the score statistic, we substitute $\Lambda_0(t)$ and $\omega_0$ by efficient estimators $\hat\Lambda_0(t)$ and $\hat\omega_0$ computed under the null hypothesis $H_0$. Here, $\hat\Lambda_0(t) = \sum_{j=1}^{n} \int_0^{t^-} dN_j(u) \big/ \sum_{k=1}^{n} Y_k(u)$, where $N_j(u) = 1\{X_j \le u, \delta_j = 1\}$, is the left-continuous version of the Nelson-Aalen estimator of the cumulative hazard [13], obtained by using the pooled sample, and $\hat\omega_0 = \hat\Lambda_0(t_{\max})$ is the maximum value of this estimator, computed at the last observed failure time $t_{\max}$. In our problem, the limiting distribution of the proposed statistic when $\omega_0$ is replaced by $\hat\omega_0$ is obtained by using the results of Pierce [14] in the context of improper survival distributions [8]. Here, $\hat\omega_0$ is an efficient estimator of $\omega_0$ if the upper bound of the support of the survival distribution is less than or equal to the upper bound of the support of the censoring distribution [8,14]. In practice, this condition expresses the fact that the uncured patients should experience the event within the maximum length of follow-up. This condition is assumed to hold and is required for establishing the limiting distribution of the proposed statistic.
The corresponding information matrix $I_{H_0}$ has elements

$$I_{ab} = \sum_{j=1}^{n} \delta_j\, w_a(t_j)\, w_b(t_j) \left[\frac{S^{(2)}(0,0,0,t_j)}{S^{(0)}(0,0,0,t_j)} - \left\{\frac{S^{(1)}(0,0,0,t_j)}{S^{(0)}(0,0,0,t_j)}\right\}^2\right], \qquad a, b \in \{\alpha, \theta, \gamma\},$$

with weights $w_\alpha(t) = 1 + \log[1 - \Lambda_0(t)/\omega_0]$, $w_\theta(t) = 1 - \Lambda_0(t)/\omega_0$ and $w_\gamma(t) = 1$, and where $S^{(r)}(0,0,0,t) = n^{-1}\sum_{k=1}^{n} Y_k(t)\, G_k^r$ with $r = 0, 1, 2$.

The elements of the score vector and of the information matrix $I_{H_0}$ are computed by using the efficient estimators of $\Lambda_0(t_j)$ and $\omega_0$ given above.

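Finally, the statistic

$$S_{H_0} = \left(U_{H_0,\alpha}, U_{H_0,\theta}, U_{H_0,\gamma}\right) I_{H_0}^{-1} \left(U_{H_0,\alpha}, U_{H_0,\theta}, U_{H_0,\gamma}\right)^{\top} \tag{2}$$

is asymptotically distributed under $H_0$ as a chi-square with three degrees of freedom.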
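The full computation can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `score_statistic`, `solve` and `chi2_sf_3df` are hypothetical helper names; tied event times share a risk set (Breslow-type handling); $\hat\omega_0$ is taken as the total Nelson-Aalen mass (its value just after $t_{\max}$), so that $\hat\Lambda_0(t_j^-)/\hat\omega_0 < 1$ at every event time; and the information matrix uses the weighted at-risk variance form, with weights $1 + \log(1 - \hat\Lambda_0/\hat\omega_0)$, $1 - \hat\Lambda_0/\hat\omega_0$ and $1$. The chi-square tail uses the closed form for three degrees of freedom.

```python
import math
import random


def chi2_sf_3df(x):
    """Upper tail of the chi-square distribution with 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def score_statistic(times, events, groups):
    """Hypothetical helper: returns (S_H0, p-value) for a binary marker `groups`."""
    n = len(times)
    # Left-continuous Nelson-Aalen estimate Lambda(t-) on the pooled sample.
    cumhaz_left, running = {}, 0.0
    for t in sorted({x for x, d in zip(times, events) if d}):
        cumhaz_left[t] = running
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, d in zip(times, events) if d and x == t)
        running += deaths / at_risk
    omega_hat = running          # assumption: total Nelson-Aalen mass, i.e. value just after t_max
    u = [0.0, 0.0, 0.0]          # (U_alpha, U_theta, U_gamma)
    info = [[0.0] * 3 for _ in range(3)]
    for tj, dj, gj in zip(times, events, groups):
        if not dj:
            continue
        ratio = cumhaz_left[tj] / omega_hat            # estimates 1 - A0(t_j)
        w = (1.0 + math.log(1.0 - ratio), 1.0 - ratio, 1.0)
        risk = [x >= tj for x in times]                # Y_k(t_j)
        s0 = sum(risk) / n
        s1 = sum(g for g, r in zip(groups, risk) if r) / n
        s2 = sum(g * g for g, r in zip(groups, risk) if r) / n
        resid = gj - s1 / s0
        var = s2 / s0 - (s1 / s0) ** 2
        for a in range(3):
            u[a] += w[a] * resid
            for b in range(3):
                info[a][b] += w[a] * w[b] * var
    stat = sum(ua * xa for ua, xa in zip(u, solve(info, u)))
    return stat, chi2_sf_3df(stat)


# Usage on synthetic data with no group difference (50% cured, Exp(1) event times, uniform censoring).
rng = random.Random(2012)
times, events, groups = [], [], []
for j in range(200):
    latent = math.inf if rng.random() < 0.5 else rng.expovariate(1.0)
    censor = rng.uniform(0.0, 8.0)
    times.append(min(latent, censor))
    events.append(int(latent <= censor))
    groups.append(j % 2)
stat, p = score_statistic(times, events, groups)
print(f"S_H0 = {stat:.3f}, p = {p:.3f}")
```

Note that the $\gamma$ component alone is the usual logrank numerator (observed minus expected events in group 1); the $\alpha$ and $\theta$ components reweight the same residuals by functions of the estimated latent survival, which is what lets the statistic pick up time-varying differences.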

Results 

Simulation study 

We conducted a simulation study to evaluate the finite-sample performance of the proposed statistic. We report the size and power of the proposed test (denoted $S_{H_0}$) together with those of the classical logrank test (denoted LR) [10].
We considered a single binary variable taking a value of 0 (e.g. absence of a marker) or 1 (e.g. presence of a marker), with half of the individuals having value 1. We assumed that the survival distribution for group 0 is $S_0(t) = \exp[-\omega_0(1 - e^{-t})]$. For group 1, we investigated over- and under-dispersed scenarios where $S_1(t)$ can be viewed as a marginal improper survival function with either a Negative Binomial (overdispersion) or a Bernoulli (underdispersion) distribution for the number of clonogens. For overdispersion ($\theta > 0$), we considered cases such as

$$S_1(t) = \left(\frac{1 - \theta e^{-e^{\alpha} t}}{1 - \theta}\right)^{-\omega_1 e^{\gamma}/\theta}$$

with the same cure fraction ($S_0(\infty^+) = S_1(\infty^+)$) or different cure fractions ($S_0(\infty^+) \neq S_1(\infty^+)$), and with or without the same latent survival function ($A_0(t, \alpha) = A_0(t) = e^{-t}$ or $A_0(t, \alpha) \neq A_0(t)$). For underdispersion ($\theta < 0$), we considered cases of the same form, with the same cure fraction or different cure fractions and with or without the same latent survival function.

Various values of the parameters were considered. For overdispersed cases we took $\theta = 0.78$, and for underdispersed cases $\theta = -1$. For the baseline cure fraction, we took $S_0(\infty^+) = e^{-\omega_0} = 0.30, 0.50, 0.70$. The values of $\omega_1$ were chosen so that the cure fractions are equal or different, with $e^{\gamma}$ equal to 1 and 1.2, respectively. For the latent survival distribution shift, we considered values $e^{\alpha} = 1, 1.25, 1.5$. The censoring time $C_j$ was generated from an exponential distribution with parameter $\zeta$. Values of $\zeta$ were computed from the chosen percentage of censoring and from the parameters of the considered distributions. The percentage of censoring reported below refers only to censored observations outside the cure fraction. We investigated no censoring and 30% censoring. The number of subjects within each group was 100. For each configuration, 500 replications were performed and the levels and powers of the two tests were estimated at the nominal level 0.05.

Figure 1 Theoretical survival curves for seven situations. The reference curve is in black. Survival curves for over-dispersed cases (resp. under-dispersed cases) are in red (resp. in blue). Legend: reference; over-dispersion (resp. under-dispersion) with same cure fraction & latent survival, same cure fraction & different latent survival, different cure fractions & different latent survival. Horizontal axis: time.

Table 1 Simulation results for overdispersed cases with 30% cure fraction
Left panel (1a): uncensored cases (cens = 0%); right panel (1b): censored cases (cens = 30%).

| Test | $e^{\alpha}$ | (1a) $e^{\gamma} = 1$ | (1a) $e^{\gamma} = 1.2$ | (1b) $e^{\gamma} = 1$ | (1b) $e^{\gamma} = 1.2$ |
|---|---|---|---|---|---|
| LR | 1 | 0.12 | 0.57 | 0.16 | 0.62 |
| $S_{H_0}$ | 1 | 0.58 | 0.80 | 0.47 | 0.79 |
| LR | 1.25 | 0.22 | 0.69 | 0.29 | 0.77 |
| $S_{H_0}$ | 1.25 | 0.87 | 0.97 | 0.79 | 0.95 |
| LR | 1.50 | 0.27 | 0.76 | 0.42 | 0.83 |
| $S_{H_0}$ | 1.50 | 0.96 | 0.98 | 0.90 | 0.97 |
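One way to generate data following this design is sketched below for the Poisson reference group and the over-dispersed group only (the Bernoulli case is not covered). Assumptions are ours: the helper names are hypothetical; the Negative Binomial count is drawn as a Gamma-Poisson mixture with shape $\nu = \omega/\theta$ and scale $\theta/(1-\theta)$; we read $e^{\gamma}$ as multiplying $\omega_1$, as in the formula for $S_1(t)$ above; and the censoring rate $\zeta = 0.2$ is an arbitrary placeholder rather than a value tuned to 30% censoring.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method (adequate for the moderate means used here)."""
    k, prod, limit = 0, rng.random(), math.exp(-lam)
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def clonogen_count(rng, omega, theta):
    """Surviving clonogens: Poisson(omega) if theta == 0; for 0 < theta < 1 a Negative
    Binomial drawn as a Gamma-Poisson mixture (shape omega / theta, scale theta / (1 - theta))."""
    if theta == 0:
        return poisson_draw(rng, omega)
    return poisson_draw(rng, rng.gammavariate(omega / theta, theta / (1.0 - theta)))


def simulate_group(rng, n, omega, theta, alpha, zeta):
    """(time, event) pairs: clonogen times are Exp(rate e^alpha), i.e. A0(t, alpha) = exp(-e^alpha t);
    censoring is Exp(rate zeta); a patient with no clonogen is cured (never has the event)."""
    data = []
    for _ in range(n):
        k = clonogen_count(rng, omega, theta)
        # The minimum of k i.i.d. Exp(e^alpha) times is Exp(k e^alpha).
        t_star = rng.expovariate(k * math.exp(alpha)) if k > 0 else math.inf
        c = rng.expovariate(zeta)
        data.append((min(t_star, c), int(t_star <= c)))
    return data


rng = random.Random(1)
theta = 0.78
omega0 = math.log(2.0)                               # baseline cure fraction 0.5
omega1 = -theta * omega0 / math.log(1.0 - theta)     # same cure fraction under the Katz model
zeta = 0.2                                           # hypothetical censoring rate
group0 = simulate_group(rng, 100, omega0, 0.0, 0.0, zeta)
group1 = simulate_group(rng, 100, omega1 * 1.2, theta, math.log(1.5), zeta)  # e^gamma = 1.2, e^alpha = 1.5
```

The Gamma-Poisson mixture is used because the standard library has a Gamma sampler but no Negative Binomial sampler; it reproduces $\Pr(K = 0) = (1-\theta)^{\omega/\theta}$ and mean $\omega/(1-\theta)$.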

To illustrate these scenarios, we plotted (Figure 1) the theoretical marginal survival curves obtained for seven situations, considering a baseline cure fraction of 50% (i.e. $S_0(\infty^+) = 0.5$). The marginal survival curve for group 0 (reference curve) is in black. The survival curves for over-dispersed cases ($\theta = 0.78$) with the same cure fraction and latent survival, with the same cure fraction but different latent survival functions (latent survival shift: $e^{\alpha} = 1.5$), and with different cure fractions (cure fraction shift: $e^{\gamma} = 1.2$) and latent survival functions are in red. The survival curves for under-dispersed cases ($\theta = -1$) in the same three configurations, with the same shifts, are in blue.

The estimated levels of the proposed test and of the logrank test under the null hypothesis of no survival difference between the two groups are within the binomial range [0.031; 0.069] for both censored and uncensored cases, whatever the level of the cure fraction. Tables 1a, 2a and 3a (resp. Tables 1b, 2b and 3b) show the results obtained for uncensored (resp. censored) cases with overdispersion, whereas Tables 4a, 5a and 6a (resp. Tables 4b, 5b and 6b) show the results for uncensored (resp. censored) cases with underdispersion.
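The binomial range quoted above matches a normal-approximation 95% interval for a rejection rate estimated from 500 replications at the nominal level 0.05; the text does not state how the range was built, so this is our reading:

$$0.05 \pm 1.96\sqrt{\frac{0.05 \times 0.95}{500}} = 0.05 \pm 1.96 \times 0.00975 \approx 0.05 \pm 0.019, \quad \text{i.e. } [0.031;\ 0.069].$$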



For uncensored cases, the power gains of the proposed test are striking for differences in either the cure fraction or the latent survival distribution. The power gains of the proposed test decrease as the cure fraction increases. In all cases, the power of the proposed test is higher than that of the logrank test. For censored cases, these trends are also observed. The main difference relative to the uncensored case lies in the magnitude of the power values, which are more markedly decreased. Overall, the same patterns are observed for the overdispersed and underdispersed cases.

Lung adenocarcinoma example 

In early-stage lung cancer (stage I), surgical resection can be considered as effective at eliminating the tumor burden for a non-negligible proportion of patients, whereas for the others it leads to a lower tumor burden and thereby prolonged survival. The majority of tumor recurrences are detected within two years after surgical resection, and five-year survival following diagnosis is frequently considered as a cure, the main threats then being other smoking-related diseases such as cardiopulmonary disorders.

The dataset considered in this study is based on a homogeneous series of 134 patients with stage IB lung adenocarcinoma who underwent surgical resection. All specimens underwent pathological review. Here, we investigated the prognostic impact of three different types of markers: genetic (Kras exon 2 mutation), genomic (recurrent copy-number losses on genome areas 19p13.3 and 19p13.11) and histopathologic (combined marker: necrosis and differentiation).

Table 2 Simulation results for overdispersed cases with 50% cure fraction
Left panel (2a): uncensored cases (cens = 0%); right panel (2b): censored cases (cens = 30%).

| Test | $e^{\alpha}$ | (2a) $e^{\gamma} = 1$ | (2a) $e^{\gamma} = 1.2$ | (2b) $e^{\gamma} = 1$ | (2b) $e^{\gamma} = 1.2$ |
|---|---|---|---|---|---|
| LR | 1 | 0.07 | 0.27 | 0.15 | 0.38 |
| $S_{H_0}$ | 1 | 0.38 | 0.57 | 0.28 | 0.48 |
| LR | 1.25 | 0.09 | 0.35 | 0.21 | 0.55 |
| $S_{H_0}$ | 1.25 | 0.69 | 0.83 | 0.48 | 0.69 |
| LR | 1.50 | 0.08 | 0.41 | 0.29 | 0.66 |
| $S_{H_0}$ | 1.50 | 0.84 | 0.94 | 0.63 | 0.83 |

Table 3 Simulation results for overdispersed cases with 70% cure fraction
Left panel (3a): uncensored cases (cens = 0%); right panel (3b): censored cases (cens = 30%).

| Test | $e^{\alpha}$ | (3a) $e^{\gamma} = 1$ | (3a) $e^{\gamma} = 1.2$ | (3b) $e^{\gamma} = 1$ | (3b) $e^{\gamma} = 1.2$ |
|---|---|---|---|---|---|
| LR | 1 | 0.07 | 0.15 | 0.12 | 0.20 |
| $S_{H_0}$ | 1 | 0.29 | 0.33 | 0.14 | 0.27 |
| LR | 1.25 | 0.07 | 0.19 | 0.14 | 0.31 |
| $S_{H_0}$ | 1.25 | 0.40 | 0.54 | 0.16 | 0.39 |
| LR | 1.50 | 0.06 | 0.21 | 0.21 | 0.42 |
| $S_{H_0}$ | 1.50 | 0.64 | 0.70 | 0.22 | 0.48 |

Table 4 Simulation results for underdispersed cases with 30% cure fraction
Left panel (4a): uncensored cases (cens = 0%); right panel (4b): censored cases (cens = 30%); columns $e^{\gamma} = 1$ and $e^{\gamma} = 1.2$.

Table 5 Simulation results for underdispersed cases with 50% cure fraction
Left panel (5a): uncensored cases; right panel (5b): censored cases.

Table 6 Simulation results for underdispersed cases with 70% cure fraction
Left panel (6a): uncensored cases; right panel (6b): censored cases.

We recall that the Kras gene belongs to a gene family of small G proteins, anchored on the cytoplasmic side of the cell membrane, that play a central role in cell signalling related to cell proliferation, cell survival and cell motility (for a review see [15]). Activating mutations of Kras, which lock the protein in the active conformation, have been described in numerous epithelial tumors including lung adenocarcinomas. In a previous study [16], we identified two recurrent driver copy-number losses located on the short arm of chromosome 19 (19p13.3, 19p13.11) that were exclusively deleted in lung adenocarcinomas from western European populations (as compared with East Asian populations). Their prognostic impact has not been previously investigated. The prognostic impact of histopathological features of lung adenocarcinoma such as necrosis and tumor differentiation has been widely debated in the literature, but recent studies pointed out that patients with tumors showing necrosis or a solid (poorly differentiated) pattern have an unfavorable prognosis and may be candidates for adjuvant therapy [17]. Here, we investigated the prognostic impact of a simple histopathological marker that combines information about necrosis and differentiation level (necrosis associated with poor differentiation versus no necrosis or well differentiated).

All patients were genotyped for Kras mutations. Primers (Kras exon 2) were used to amplify the relevant regions and DNA sequencing was performed on an ABI3730xl Sanger sequencer. All mutations were confirmed by bidirectional sequencing. In this study, the percentage of Kras mutation was 18% (24 cases); 37.6% and 34% of tumors displayed copy loss on 19p13.3 and 19p13.11, respectively; and 23% of the tumor samples showed necrosis associated with poor differentiation. The time-to-event (death) was calculated from the date of treatment to the time of death or last follow-up. Overall survival rates were derived from Kaplan-Meier estimates and are given with their 95% confidence intervals. The median follow-up was four years and we observed thirty-seven events. For the entire population, overall survival at two years and five years was 87.2% [81.5-93.3] and 65.4% [56.3-75.9], respectively.

When testing for differences in overall survival according to Kras mutation status, the logrank test (LR = 1.2, p = 0.26) was not significant, in contrast with the proposed test ($S_{H_0}$ = 9.3, p = 0.025). Figure 2 displays the Kaplan-Meier estimates of survival according to Kras mutation status.

When testing for differences in overall survival according to copy-number loss on genomic areas 19p13.3 and 19p13.11, the logrank test was not significant for either area ($LR_{19p13.3}$ = 0.5, p = 0.48; $LR_{19p13.11}$: p = 0.33), whereas the proposed test showed no difference for 19p13.3 ($S_{H_0}$ = 4.3, p = 0.23) but a significant difference for 19p13.11 ($S_{H_0}$ = 8.2, p = 0.041). Figure 3 displays the Kaplan-Meier estimates of survival according to copy-number loss on 19p13.11.

When testing for differences in overall survival according to the combined histopathological marker, the logrank test (LR = 0.1, p = 0.81) was not significant, in contrast with the proposed test ($S_{H_0}$ = 7.9, p = 0.048). Figure 4 displays the Kaplan-Meier estimates of survival according to the combined histopathological marker status.
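The reported p-values can be recomputed from the quoted statistics, using three degrees of freedom for $S_{H_0}$ and one for the logrank test. This sketch relies only on the closed-form chi-square tails for 1 and 3 degrees of freedom; `chi2_sf` is a hypothetical helper name, and only statistics whose values are legible in the text are included.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 3 only."""
    tail = math.erfc(math.sqrt(x / 2.0))
    if df == 1:
        return tail
    if df == 3:
        return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 3")


# (statistic, degrees of freedom, p-value reported in the text)
reported = {
    "Kras, S_H0": (9.3, 3, 0.025),
    "19p13.3, S_H0": (4.3, 3, 0.23),
    "19p13.11, S_H0": (8.2, 3, 0.041),
    "necrosis/differentiation, S_H0": (7.9, 3, 0.048),
    "19p13.3, LR": (0.5, 1, 0.48),
}
for name, (stat, df, p_text) in reported.items():
    print(f"{name}: {stat} on {df} df -> p = {chi2_sf(stat, df):.3f} (reported {p_text})")
```

Because the statistics are quoted to one decimal, the recomputed p-values can differ slightly from the reported ones in the third decimal.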

All the figures show a clear time-varying effect, with the difference between the two curves changing as time goes on. From a biological perspective, the marginal survival distributions observed for Kras-positive (activating) mutation, deletion of genomic area 19p13.11 and necrosis/poor differentiation status can be interpreted as reflecting molecular changes affecting either the tumor burden or the dynamic growth.

Discussion 

With significant progress in defining homogeneous histological and clinical groups of early-stage cancer patients who received the same potentially curative therapy, the challenge is now to find novel molecular markers capable of separating patients according to their time-to-event outcome. This problem can be handled by considering cure rate models that are specified using either a two-component mixture model or a bounded cumulative hazard approach.

In this work, a score test is proposed for testing the null hypothesis of no survival difference in early-stage cancer. From a biological point of view, this score test can detect changes in the cure fraction, the distribution of surviving clonogens and the tumor progression. It is derived from a flexible model that describes the impact of discrete markers on the survival time distribution with or without the same cure fraction, and it stems from biological as well as pragmatic statistical considerations. A nice feature of the proposed score-type statistic is that it can be easily implemented, since it does not require estimating the parameters of the cure model under the alternative hypothesis. It should be noted that the proposed procedure can be extended to compare more than two groups, with the Poisson cure rate model as the benchmark model for the reference group. The alternative hypothesis would then be that at least one of the groups differs from the reference one at some time in either the distribution of the number of clonogens or the latent (clonogenic) survival functions.

Simulation results show that striking gains in power can be achieved by our proposed test compared with the classical logrank test. As the cure fraction increases, the power of the test decreases but remains higher than that of the logrank test. This result is not surprising, since increasing the cure fraction reduces the number of potential events. In the presence of censoring, the power of the proposed test decreases but remains higher than that of the logrank test. It is worth recalling that the validity of the proposed score test requires asymptotic efficiency of the cumulative hazard estimators, which implies that susceptible patients should experience the event within the maximum length of follow-up.

Figure 2 Kaplan-Meier curves of the overall survival based on Kras mutation status.

Figure 3 Kaplan-Meier curves of the overall survival based on copy-number loss of 19p13.11 status.

Figure 4 Kaplan-Meier curves of the overall survival based on the combined histopathological marker.

In the homogeneous series of early-stage lung adenocarcinoma presented in this article, the proposed statistic is particularly appealing since the majority of the patients are amenable to cure. Although some lung cancer studies have reported a deleterious prognostic effect of Kras mutation, there is still some debate. In this study, we show a significant relationship between overall survival and Kras mutation status that would have been overlooked by only considering the results of the classical logrank test. From a biological point of view, one could hypothesize that downstream effectors of Kras mutation have complex biological activities affecting either the tumor burden or the dynamic growth. Moreover, these results also argue in favor of considering combined histopathological markers in prognostic studies and give some interesting insights regarding the recurrent driver copy-number loss on genomic area 19p13.11 that may require future exploration. In further work, it could be of interest to estimate the parameters associated with survival differences. For such a purpose, the estimation procedure introduced by Tsodikov [18] could be envisaged.

Conclusion 

In summary, detecting molecular markers associated with complex survival patterns in early-stage cancer is of potential interest for research, as it sheds light on their contribution to the natural history of tumor disease. We believe that our proposed score test statistic is a powerful tool for detecting molecular markers associated with complex survival patterns. Moreover, this test statistic can be applied in any other medical field in which some patients may never experience the event of interest.